Introduction 



• I'm Dan Kaminsky 

- This is not a talk on DNS 

- This is not a talk on DNSSEC 

- This is not a talk on X.509 

- This is (mostly) not even a talk about offense 

• Though it'll happen 

• We are here to discuss defense 

- Why it's hard 

- Why it's still hard 

- How the Web is (still!) difficult to secure in very basic ways 

- What technical steps can be taken to make it easier for 
developers to secure their sites 



Warning 



This is one of the more ornery talks I have ever 
delivered. 

- I don't think Security has been sufficiently 
respectful of other engineering requirements. 

- I don't think randomized tokens in GETs or POSTs 
are a particularly scalable solution to XSRF 

- I'm not a big fan of regular expressions. Period. 
And you won't believe what I want to do with 
Base64. 



The Web Has Won 

HTML, JavaScript, and CSS are, by a wide margin, 
the most popular programming languages on the 
planet, at least for front end design 

- Bigger than Win32 

- Bigger than Objective C 

- Bigger than Java 

- Bigger than Flash 

Why install software when you can just browse to 
www.kayak.com ? 



The Web Has A Security Problem 

• Cross Site Scripting (XSS) and Cross Site Request 
Forgery (XSRF) are endemic 

• SQL Injection is not going away 

- And not just SQL - XML, JSON, even LDAP is in trouble 

• These are not new problems. These go back a 
decade. Why are we still suffering them? 

- After all, we've been saying for years, use randomized 
tokens in URLs, and "Validate your input"! 



The Traditional Response 

"Developers are stupid and lazy." 

- This is a great response - for our egos 

- It defines hackers as intelligent and industrious! 



The Reality (as I see it) 

• It remains too difficult to make web browsers do 
surprisingly simple things 

- "Dan, I've been coming to Defcon for years. I love 
security. But you know, it's $50K for me to build the 
site, and $150K for me to get somebody to come out 
and secure it. Do you know anybody who will pay 4x 
for me to build them a site?" 

- Our advice for how to secure things is really 
amazingly expensive 

• Expensive in terms of dev hours 

• Expensive in terms of outside validation 

- We think that's OK, because Security Is Important 



The Reality Of Randomized Tokens In 
URLs (to stop XSRF, and maybe XSS) 

http://www.site.com/foo.php?token=298371231 

- (Or a hidden form element in POSTs) 

[All] application local URLs have to be rewritten for using the 
randomizer object. While standard HTML forms and hyperlinks pose 
no special challenge, prior existing JavaScript may be harder to deal 
with. All JavaScript functions that assign values to 
document.location or open new windows have to be located and 
modified. Also all existing onclick and onsubmit events have to be 
rewritten. Furthermore, HTML code might include external 
referenced JavaScript libraries, which have to be processed as well. 
Because of these problems, a web application that is protected by 
such a solution has to be examined and tested thoroughly. 

- Martin Johns, University of Hamburg, on "SessionSafe", a system which 
put randomized tokens in DNS subdomains, an easier place for them 
(thanks to Amit Klein for pointing me to this paper) 

- http://www.informatik.uni-hamburg.de/SVS/papers/2006_esorics_SessionSafe.pdf 

This is not good! 
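
To make the rewriting burden concrete, here is a minimal sketch of what "put a token in every application-local URL" implies. The helper names (`newSessionToken`, `addToken`) and the host `www.site.com` are illustrative assumptions, not part of SessionSafe or any real framework. Every link, form action, and script-generated URL in the application would need to pass through something like this.

```javascript
// Sketch only: appends a per-session random token to application-local URLs.
const crypto = require("crypto");

// Hypothetical helper: one random token per session (32 hex characters).
function newSessionToken() {
  return crypto.randomBytes(16).toString("hex");
}

// Adds ?token=... to a URL, but only when it points at our own host.
function addToken(rawUrl, token, ourHost) {
  const url = new URL(rawUrl, "http://" + ourHost + "/");
  if (url.host !== ourHost) {
    return rawUrl; // foreign link: leave untouched so the token does not leak
  }
  url.searchParams.set("token", token);
  return url.toString();
}

const token = newSessionToken();
console.log(addToken("/foo.php", token, "www.site.com"));
console.log(addToken("http://www.example.org/", token, "www.site.com"));
```

The function itself is short; the cost the quote describes is in finding every place a URL is produced and routing it through the function.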



And it's not like we're weak in really obscure places 

• XSS and XSRF are both failures of Session Management 

- Session management isn't exactly an obscure feature! It's an 
inherent aspect of any system that allows authentication! 

• Except the web wasn't originally designed as a particularly 
authenticated resource 

• All material was designed to be able to link to all other material 

• Most material was assumed to be flat text files in a directory 
somewhere 

- Session management's "canonical implementation" is cookies 

• Small blobs of data that, once added to a domain, are attached to 
each request to that domain 

- Cookies don't care where a request came from 

• So, badguy.com can send you to goodbank.com, any URL, and mix his 
chosen URL with your authenticated credentials 

• This is the heart of XSS and XSRF 

• XSRF tokens are about sacrificing the automatic nature of cookies for 
manual integration of the token in each generated URI 



A Question 



It's 2010. Why is secure session management still 
such a hack? 

- XSRF tokens are painful and expensive to deploy (and only 
kind of work) 

• Tokens in URIs, either subdomain or GET param, leak via Referer 
headers and document.referrer; see Nathan Hamiel and Shawn 
Moyer's Dynamic CSRF talk from BH2010 

- OK, yes, XSRF tokens could be made one-time use, or bound to the rest of 
the URL. STILL. 

- Cookies leak, even with "secure", even with "httponly" 

Answer: The security community hasn't asked for 
anything better. 

- Gut response: "See, this is why third party cookies should 
be shut off! Then, they'll have to use XSRF tokens!" 

• "Who cares if it'll break the web? It'll still work... um... mostly..." 



Our Message 

"It doesn't matter if it's expensive." 
"It doesn't matter if it's fragile." 
"It doesn't matter if it breaks things." 
"It doesn't matter if it doesn't even do a great job securing things." 

It's security! Obviously everybody has to do it! 

- And so instead of improving our technologies to the 
point where security is achievable, we spin our wheels 
demanding the deployment of immature defenses 



You may disagree. 



• If you think all our existing solutions are good 
enough, and the only problem is lazy and stupid 
developers ("insufficiently incentivized" if you're 
an academic), nothing I can say is going to 
convince you otherwise. 

• But if you agree, then your immediate question 
should be: What are you recommending? 

- It is not enough to curse the darkness. One must 
praise the light! 



First Law Of Securing the Web: 
You are not allowed to break the web 

• This is a point of significant contention! 

- Not just about cookies. 

• Remember "Mogul", the huge project centered 
around dealing with Marsh Ray's TLS 
Renegotiation vulnerability? 

• Remember how it involved an enormous IETF 
standardization effort around a fix, and a huge 
amount of work? 

• Did you notice all the patch announcements of 
crypto libraries applying the renegotiation fix? 



The Fix Is Off By Default In Firefox 

Just like I told Marsh Ray it would be. 

"Note that to benefit from the fix for CVE-2009-3555 
added in nss-3.12.6, Firefox 3.6 users will need to set 
their security.ssl.require_safe_negotiation preference 
to true. In Mandriva the default setting is false due to 
problems with some common sites." - Mandriva Patch 
Notes 

security.ssl.allow_unrestricted_renego_everywhere__temporarily_available_pref 

Current default value: DEPENDS, see end of section 
The development version of Firefox (3.7-pre) uses "false" 
The stable releases 3.5.9 and 3.6.2 use "true" 

As soon as a sufficient amount of servers had a chance to 
upgrade, the default in stable releases will be switched to 
"false", too 



Corollary: You Can't Afford To Wait For Everybody To Patch 

• "The Internet Has No Flag Days" 

- i.e. "OK everyone, on July 8th, 2013, we will all 
simultaneously stop using Internet Explorer 6" 

• They needed to find a way to securely downgrade 
for servers that didn't support their fix. 

- They didn't (maybe they couldn't?), and ultimately 
it's now all lost effort. 

- When will enough internal servers patch? Never. 

- Hard but true. 



Second Law Of Securing The Web: 
Defenses must meet all engineering requirements 

• Classical security theory: "The attacker need only find 
one bug, while the defender must find them all." 

- This is true, but incomplete. 

• Updated security theory: "The attacker need only 
consider one engineering requirement, while the 
defender must balance them all." 

- If your defense is not fast enough, it's not good enough 

- If your defense is not compatible enough, it's not good 
enough 

- If your defense is not reliable enough, it's not good enough 

- If your defense is not usable enough, it's not good enough 

- If your defense is too hard to build or deploy, it will 
probably be neither built nor deployed 



Security Is (Still!) New 

Security is a new first class engineering requirement 
for software 

- Power efficiency is a new first class engineering 
requirement for consumer electronics 

- Just because the TV takes less electricity doesn't mean it's 
allowed to be any less pretty 

- Just because the code is more secure doesn't mean it's 
allowed to be slow 

There remains surprisingly basic work to be done to 
facilitate security 

- Building a secure session context is one of those things 

- Let's talk about how to do that, while not screwing 
everything else up 



So. How are we going to get a secure session context? 

First - why does a secure session context 
improve security? 

- What do Cross Site Scripting (XSS) and Cross Site 
Request Forgery (XSRF) have in common? 

• XS: Cross Site 



A Tale Of Two Classes Of Web Page 



BBC News: Needs Deep Linking 



Amazon.Com Shopping Cart: Needs 
Security From Outside Attackers 



[Screenshots: a BBC News article page, "Flights resume as ash fear recedes" (news.bbc.co.uk, 21 April 2010), and the Amazon.com "Place Your Order" checkout page] 



On Boundaries 



• Most natural place to place boundary: Unauthenticated vs. 
Authenticated 

- Unauthenticated pages are the best landing points 

- Authenticated pages expose the greatest complexity 

• A strong session context prevents (Reflected) XSS and 
XSRF from executing at all 

- The attacker just cannot navigate to the endpoint with the bug 

- Making entire families of bugs unexploitable is always a good 
thing :) 

- If even the unauthenticated landing points have sufficient 
complexity that they're likely to be XSSable, then www.foo.com 
can be 302 Redirected to public.foo.com or even 
www.foopublic.com. 



The Most Common Attempt: 
Server-Side Referrer Checking 

• HTTP requests can contain a "Referer" field, which is 
supposed to describe the URL of the page that sourced the 
request for a particular asset 

- Yes, it's misspelled in the standard 

• It is possible for an HTTP server to examine every request 
that comes in with an authenticated Cookie, and see if it 
also comes in with a Referer header from the same site 

• Many Content Management Systems have attempted to 
use Referer checking to stop XSRF and related attacks 

- Developers, finding a defensive technology that is easy and 
nondisruptive to implement, will actually do the work! 

• We tell them not to do this, for "Security Reasons" 

- "Referer headers can be spoofed using XMLHTTP and by 
using flash as demonstrated by Amit Klein and rapid7 and 
therefore cannot be trusted." 

• http://www.cgisecurity.com/csrf-faq.html 
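
As a rough illustration of why developers reach for this defense, a strict server-side Referer check fits in a few lines. This is a sketch under assumptions: `refererIsSameSite` is a made-up helper, `headers` is assumed to be a plain object with lowercased header names (as Node's HTTP server provides), and no particular CMS is being reproduced.

```javascript
// Returns true only when the Referer header names a page on our own host.
function refererIsSameSite(headers, ourHost) {
  const referer = headers["referer"]; // header name as spelled in the standard
  if (!referer) {
    return false; // strict policy: a missing header is treated as cross-site
  }
  try {
    // Compare the parsed host, not a string prefix, so that
    // www.goodbank.com.badguy.com does not pass as www.goodbank.com.
    return new URL(referer).host === ourHost;
  } catch (e) {
    return false; // unparseable Referer
  }
}

console.log(refererIsSameSite({ referer: "http://www.goodbank.com/home" }, "www.goodbank.com"));
console.log(refererIsSameSite({ referer: "http://badguy.com/" }, "www.goodbank.com"));
```

Note that the strict policy rejects every request that arrives without the header, whatever the reason the header is missing.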



But It's Not True! 

Amit et al fixed this years ago 

There is no known mechanism for causing a 
browser to emit an arbitrary Referer header, 
and hasn't been for quite some time. 

- More importantly, if one is found, it's fixed, just 
like a whole host of other browser bugs 

So can we use this? 



The Real Reason You Can't Depend On 
Server-Side Referrer: Appcompat 

• There are many mechanisms that result in web 
browsers navigating from page to page 

- Follow an anchor 

- Change document.location.href 

- Follow a 302 redirect 

- Follow a <meta http-equiv="refresh"> link 

- Window.open 

• Referer header inconsistently attached - some 
methods have the header, some don't 

- Differs by browser 

- Really differs by plugin 



...and that's without the "security tools" butting in 

• http://codex.wordpress.org/Enable_Sending_Referrers - 
how to make WordPress administration work (at least for 
2.0x) if you have any of the following installed: 

- Norton Internet Security, Norton Personal, NetBarrier, SyGate 
Firewall, Kerio Firewall 4, Zone Alarm Pro, Agnitum Outpost 
Firewall Pro 2008, McAfee, Privoxy, etc. 

• And that's to say nothing about network proxies like Squid, 
from which about one fifth of HTTP requests are sourced 

• Most blocking is there to protect privacy interests or to 
protect against Dynamic XSRF attacks, which is reasonable, 
but it applies even to Referer headers for the same site 

— One security technology interfering with another security 
technology? Impossible! 



Fail Open Openly Failing 

• "Can't we just enforce XSRF protections if there's 
a Referer header, and allow the user in anyway 
(fail open) if the header is missing?" 

- This is actually the policy of some CMSs in the field 

- Since there are navigation types that suppress Referer 
even on unfiltered hosts (<meta http-equiv="refresh">), 
they're exposed. 

• See also HTTPS->HTTP Referer suppression, which is an 
intentional feature to prevent full URI leakage from the 
secure context (thanks David Ross, sirdarckcat) 
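
The fail-open policy described above can be sketched as follows. The function name `allowRequest` and the `failOpen` flag are assumptions made for illustration. The point of the sketch is the middle branch: any navigation type that suppresses Referer is let through.

```javascript
// Referer check with a configurable policy for requests that carry no header.
function allowRequest(headers, ourHost, failOpen) {
  const referer = headers["referer"];
  if (!referer) {
    return failOpen; // fail open: a missing header is waved through
  }
  try {
    return new URL(referer).host === ourHost;
  } catch (e) {
    return false;
  }
}

// A request triggered by a meta refresh on an attacker page may carry no Referer.
const forgedViaMetaRefresh = {};
console.log(allowRequest(forgedViaMetaRefresh, "www.goodbank.com", true));
console.log(allowRequest(forgedViaMetaRefresh, "www.goodbank.com", false));
```

With `failOpen` set, the attacker only needs a navigation method that drops the header. With it unset, legitimate users behind header-stripping tools are locked out.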



Can We Go Client Side? 



• Suppose there's too much weirdness going on at the HTTP 
layer to detect cross-site activity at the server 

- Not going to go into the Origin header, let's just say it's unusable 
in its present state 

• Suppose we could detect such behavior in the client. Could 
we use it? 

- "But we're pushing security to the client! We're not supposed 
to trust the client!" 

- It is the client that has the context of where the request really 
came from; the server is impoverished 

• Two problems 

- 1) How do we prevent execution of an XSS? 

- 2) How do we prevent navigation to an XSRF vulnerable 
endpoint? 



Preventing Script Execution Via Interpreter Suicide 



General idea: Preface all potentially attacker 
controlled HTML with a <script> that detects 
(we'll talk about how later) cross-site 
navigation 

If such is detected, we assume there might be 
something unsafe later in the bytestream, 
and we have to prevent the Javascript/HTML 
interpreter from ever reaching it 



Methods For Suppressing Parsing Of Dangerous Content 

Renavigation: Assign document.location.href to a safe landing page 

- Fires an event to cause the window to navigate 

- Does not necessarily stop the existing parse thread 

Replacement: Assign document.body.innerHTML="" 

- Replaces the object into which content was being streamed 

Repetition: Prevent the script block from returning 

- Internal: while(1){ i=0; } 

- External: Add a dependency on a synchronous call that actually does block the 
interpreter for a nonzero amount of time 

Release: Call a nonexistent JS function, NonExist(); 

- Brandon Creighton: VBScript will still run 

- Probably possible to inject HTML that runs an IFrame, which can restart the JS 
interpreter 

- Releasing the interpreter does not work: Too many ways to get it back :( 

Realistically: Renavigate, then Replace. Can have all four though. 

- Yes, it'd be nice if we had a by-design method that did this 

- window.stop() does not 



Example 

<script> 
dest = "http://www.cnn.com"; 
document.location.href = dest; // Renavigate 
document.body.innerHTML = ""; // Replace 
while (document.body.innerHTML != "") { 
x++; } // Repeat 
alert(dest); // never reached 
</script> 
<script>alert(1);</script> NEVER REACHED 



Why This Works (and it does work) 

• <script> 
var foo=1; 
</script> 
<script> 
alert(foo); 
</script> 

• The first script block must return before the 
second script block is allowed to execute - in 
other words, all functionally correct JS 
interpreters must parse HTML blocks linearly 



What about XSRF? 

• Cross Site Request Forgery doesn't care about the 
reply - the reply can be ignored, it's the request 
(when mixed with the user's cookie) that's the 
source of pain 

- What if the reply to the REST endpoint was itself a 
batch of JS? And this batch contained an in-session 
challenge that, if passed, would result in a second 
query with a proper XSRF token? 

- What if we only did this, if the requestor's User-Agent 
was a browser? 

• Performance impact: 1xRTT (One HTTP round 
trip) 



AppCompat of using HariKari.JS to stop XSRF 

• Compatibility impact: Some problems with XMLHttpRequest and 
Flash 

- Both look like browsers, since they "borrow" the browser's HTTP stack 

- Both are mostly same-domain only, with explicit opt-in required for 
cross domain activity 

- Neither will just automagically execute Javascript 

• Could add code to read back the challenge token (from a custom HTTP header, 
perhaps), but that's just as hard as adding a random token 

• Could add code to add a random header, but that's also just as hard as adding 
a random token (does save us 1xRTT though) 

- Best possible today: Hijack the XMLHttpRequest object, such that all 
calls of object.send() are overloaded to add a randomized header 

• Should be a similar include possible with Flash's network stack 

• Just figured this out though, so it might not work 

• But what test could we use, to determine if we should cut 
execution? 
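
The XMLHttpRequest hijack mentioned above might look roughly like this. It is a sketch, not a tested defense: `FakeXMLHttpRequest` is a stand-in class so the code runs outside a browser, and the header name `X-Session-Token` and the helper `installTokenHeader` are invented for illustration. In a browser one would pass the real `XMLHttpRequest` constructor instead.

```javascript
// Stand-in for the browser's XMLHttpRequest so the sketch runs outside a browser.
class FakeXMLHttpRequest {
  constructor() {
    this.headers = {};
    this.sent = false;
  }
  open(method, url) {
    this.method = method;
    this.url = url;
  }
  setRequestHeader(name, value) {
    this.headers[name] = value;
  }
  send(body) {
    this.sent = true; // a real implementation would put the request on the wire
  }
}

// Wraps send() so that every request carries the session's token header.
function installTokenHeader(XhrClass, headerName, token) {
  const originalSend = XhrClass.prototype.send;
  XhrClass.prototype.send = function (body) {
    this.setRequestHeader(headerName, token);
    return originalSend.call(this, body);
  };
}

installTokenHeader(FakeXMLHttpRequest, "X-Session-Token", "r4nd0m");
const xhr = new FakeXMLHttpRequest();
xhr.open("POST", "/transfer");
xhr.send("amount=1");
console.log(xhr.headers);
```

Existing calling code is unchanged, which is the appeal: the header is added in one place rather than at every call site.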



We can't use window.history :( 



Window.history is an array 
that seems to contain all 
URLs that this particular 
window has browsed to 

It COULD contain all URLs of 
the same domain that are in 
history 

- We could see if the last page 
was from our own domain 

But instead reading back 
URLs is banned for security 
reasons :( 







What about document.referrer? 

document.referrer is populated in the 
Browser DOM, and is read by JS, so nothing on 
the wire can suppress it 

However, support for it across navigational 
types is more inconsistent than you can 
imagine... 



Document.referrer: Beautiful, but fails 
compatibility across browsers 

[Table: whether document.referrer is populated, by browser (firefox3, chrome, ie8, opera) and by navigation type (<a href=> click, document.location.href change, 302 Redirect, Meta Refresh, Window Open); cell values not recoverable] 



Nice Try 



• ...no, you don't get to just say "We'll only allow <a href> 
links for navigating between pages". 

- Too many developers have too many things you'll have to watch 
for. 

- ...also, the list of tested navigators is not comprehensive 

- Changing window.location 

- Changing window.location from a foreign domain 

- Window.navigate 

- Etc. 

• Some possibility of overriding prototypes, such that 
changing document.location.href was transmogrified into 
an <a href> click event 

- However, wouldn't fix 302 or Meta 

- Also, prototype behavior is notoriously weird across browsers 



Standards 

Essentially, we have a lot of unspecified-by-any-standard 
behavior here, and it's 
preventing us from being able to use 
defenses we'd otherwise like to :( 

- Might be able to enable certain defenses for just 
the browsers that always do the right thing, but 
right now that's only Opera :( 



What We Really Want 

Isn't there some way we could say that a 
particular window was blessed with 
authenticated status, such that we weren't 
constantly checking for this magic chain 
keeping us within our own domain? 



What about Window.Name? 

Window.Name is a "cookie" that is specific to a 
Window, not to a Domain 

What if, on authenticating to a site, window.name 
received a magic value, that every other 
authenticated page was then programmed to 
check for? 

- XSS would suddenly stop working, because windows 
opened by the attacker wouldn't have the appropriate 
window.name value 

- XSRF would then JS challenge, and be stopped as well 

- Perfect, right? 



Alas... 

• Window.Name is preserved as a window browses 
across domain boundaries 

- Me to Kuza55: "I'm trying to figure out how to get 
this particular character through an XSS filter." 

- Kuza55 to Me: "Uh. eval(name) and get on with your 
life :)" 

• window.name (and to a lesser extent, window.opener) are 
the universal XSS carriers - put whatever you want in 'em, 
then eval. 

• Apparently Giorgio Maone came up with this trick, Kuza55 
doesn't remember though 

• So the attack... 



...is to navigate the victim window back to our 
own domain for name/token extraction 

• 1) Alternate tab / popunder gets a handle to the 
primary window via window.open or 
window.opener 

• 2) Every so often, popunder navigates the 
window to its own domain, granting it access to 
the foreign window.name 

• 3) Popunder runs window.back() on foreign 
window to put it back where it's supposed to be 

• Probably more variants that aren't quite so 
distracting (popunder sends user to their bank, 
choosing it via CSS history hacking) 



Meta-Note 

Just because you're worrying about 
performance, compatibility, reliability, and 
ease of use, doesn't mean you don't have to 
keep trying to pwn your own defenses 

- The world has enough defensive technology that 
doesn't work, thank you very much :) 



Can HTML5 help? 



The sessionStorage attribute of the window 
object maintains key/value pairs for all pages 
loaded during the lifetime of a single tab (for the 
duration of the top-level browsing context)... 
When another document is loaded, 
sessionStorage is initialized from memory for 
same-origin URLs. 

- MSDN 

This is a key/value pair that is both domain 
specific and window (or tab) specific 

This is awesome! This is exactly what we need! 



...almost :( 



Foreign sites can still get a handle to our window 

- Window.open to a named window 

- Taking the return value of an initial window.open 

- Caching the handle from a popunder's window.opener 

With a handle to our window, he can still navigate us 

- Although he can't extract a token, window.name style, he can still 
interpose his requested URI with our magic sessionStorage object :( 

Also, if we ever randomly browse from our site to a foreign site, the 
foreign site can send the user back to us with a URI of the foreign 
site's choosing 

- sessionStorage is attached to the window, and does not expire just 
because the window went elsewhere 

- This is to support SSO use cases 
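
For reference, the sessionStorage check being discussed could be sketched like this. `makeStorage` is a stand-in with the `getItem`/`setItem` shape of `window.sessionStorage` so the code runs outside a browser, and `blessWindow`, `windowIsBlessed`, and the key name `sessionToken` are illustrative assumptions.

```javascript
// Minimal stand-in with the getItem/setItem shape of window.sessionStorage.
function makeStorage() {
  const items = new Map();
  return {
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => {
      items.set(key, String(value));
    },
  };
}

// Called once, on the page that completes login.
function blessWindow(storage, token) {
  storage.setItem("sessionToken", token);
}

// Called at the top of every authenticated page.
function windowIsBlessed(storage, expectedToken) {
  return storage.getItem("sessionToken") === expectedToken;
}

const userTab = makeStorage(); // the tab the user logged in from
const attackerTab = makeStorage(); // a window the attacker opened
blessWindow(userTab, "s3cret");
console.log(windowIsBlessed(userTab, "s3cret"));
console.log(windowIsBlessed(attackerTab, "s3cret"));
```

The check separates the two tabs, but as the bullets above note, it says nothing about who chose the URL that the blessed tab is now loading.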



Towards A Better Ask 



2010, complete with a sessionStorage attribute, 
and we still do not have a straightforward way 
to establish a locked session against which 
outside interlopers cannot inject URLs that will 
be retrieved with the user's credentials 

- XSRF tokens are empirically neither reasonable nor 
easy - and anyway, they leak 

- A variant of sessionStorage is probably fixable, and Ian 
Hickson and I have been working to find something 
that meets operational requirements 

• There's all sorts of weirdness with supporting SSO systems 
and multiple tabs 

• Until we admit we have a problem with what we have, we 
cannot begin to make better asks. 



And what would that ask be? 



if(window.newSession){ atomicNavigate('login.php')} 

- newSession==1 if and only if this page was reached as the result of a local nav event 

• Must keep plugins in mind! They nav us too! 

• Bookmark navigations should always be seen as local, as would nav from anything that passed 
document.domain checks 

- atomicNavigate('login.php') guarantees a parser nuke/halt 

SSO support would probably be necessary as well 

- document.sessionDomains as an array of domains allowed to source nav events without 
invalidating a session 

- This is larger than document.domain because it has to support out-of-bailiwick domains 

Might be interesting to standardize "marking" of links on same-site that ought not 
be trusted 

- Have to manage 302 "returns" 

- Origin header did a lot of work with this, which unfortunately left the standard really wonky 

• We might be able to do better inside of the client 

Until then... we have interpreter suicide, but no good way to detect session creation 

- If you're creating something from scratch, yes, you could go 100% document.referrer and use 
only anchor links. But never 302. 



Input Validation: Another Realm That Could Use A Reboot 

• To paraphrase: "A man has a problem with a parser, 
and puts a regular expression filter in front of it. Now 
he has two problems." 

• Bobince's Law: "Every time you attempt to parse HTML 
with Regular Expressions, Russian hackers pwn your 
webapp." 

- Bobince from Stack Overflow 
(http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) 

• It is generally understood to be foolish to try to parse 
something with the complexity of this... 



Tree graphs for www.wired.com and www.cnn.com 




Punctuation Considered Harmful? 



...with this: /(<a\s(?:[^>](?!href))*href\s*)(&(&[^;]+;)?(?:.(?!\3))+(?:\3)?)([^>]+>)/ 

- Curtis Poe, Author: "I was strutting like a peacock 
when I wrote that, followed quickly by eating crow 
when I ran it. I never did get that working right. I'm 
still not sure what I was trying to do." 

- ...and that's not even trying to validate an IP address, 
like this is: 

• \b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b 

Was there a reason we thought this was a good 
idea? 
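
For contrast with the IP-address pattern above, the same check can be written as ordinary code that splits the string and tests each number. This is only an illustrative sketch; `isDottedQuadIPv4` is a made-up name, and like the pattern it tolerates leading zeros.

```javascript
// Validates a dotted-quad IPv4 address by parsing it instead of pattern matching.
function isDottedQuadIPv4(text) {
  const parts = text.split(".");
  if (parts.length !== 4) {
    return false; // must be exactly four fields
  }
  return parts.every((part) => {
    if (part.length === 0 || part.length > 3) {
      return false; // one to three digits per field
    }
    for (const ch of part) {
      if (ch < "0" || ch > "9") {
        return false; // digits only
      }
    }
    return Number(part) <= 255; // range check done as arithmetic
  });
}

console.log(isDottedQuadIPv4("192.168.0.1"));
console.log(isDottedQuadIPv4("256.1.1.1"));
```

Each rule (four fields, digits only, at most 255) is a separate, readable line rather than an alternation repeated four times.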



Well, hang on 



• I'm being ornery, but I'm not being that ornery 

• Many uses of regular expressions for input validation 
aren't about implementing complex rules for input 

- As the ha.ckers.org cheat sheet shows, this does not work. 

• It is a necessary procedure for many systems to be able 
to integrate (potentially attacker supplied) user 
substrings into queries and commands otherwise 
written by the programmer 

- The challenge is to keep user substrings from being parsed 
as attacker content 

- Regexes are used to escape characters that might 
otherwise be parsed as node terminators 



The Truth 

• As the profusion of SQL/XML/JSON/LDAP 
injections shows, this isn't working very well 

- Parser drift doesn't just apply to HTML parsers - once 
you start including the finer points of Unicode and 
n-tier traversal, it really is quite tricky to know what's 
going to exist at the end of a pipeline 

- Just as XSRF defenses struggle with needing to be 
dotted across the codebase, so too do injection 
defenses 

• Are there other approaches we could try? 



Alternatives 



Parameterization 

- Provide user data through a different channel than programmer code 

- Gold standard 

- Works great 

- Doesn't traverse n-tiers well 

- Requires change in coding style 

- Can have issues being deployed (requires permission from the "priesthood of DBAs" to 
deploy) 

Pascal Strings 

- Prepend a string with a length 

- Leads inevitably to overlapping length fields, which are just going to get you 
compromised 

• External lengths lead to malloc 
• Internal lengths lead to memcpy 
• memcpy leads to suffering 

Taint tracking 

- Bytes are marked with their source 

- Taint is allowed to spread 

- Hasn't yet been made to scale 



What About Randomized Terminators? 



Used (reasonably effectively) in MIME 

=_NextPart_002_0186_01C89653.D38E9D10 
Content-Type: text/html; charset="Windows-1252" 
Content-Transfer-Encoding: quoted-printable 

... 

=_NextPart_002_0186_01C89653.D38E9D10-- 

- Nestable 

Rather than dotting a string with backslashes and 
escape codes all over the place, we have a single, fairly 
large escape sequence at the beginning and end of 
attacker controlled content 

- This is portable to other protocols 

- I refer to this as "treelocking" 



(Implicit) Treelocking for XML 



<foo> 
<evil> 

XXX 

</evil> 
</foo> 

<foo> 
<evil> 

<_treelock_EYKPRZEJ2LKF55UJ> 

XXX 

</_treelock_EYKPRZEJ2LKF55UJ> 

</evil> 
</foo> 

In this situation, the attacker can inject whatever string he likes - as 
long as he never discovers that the escape sequence on this 
particular submission is EYKPRZEJ2LKF55UJ, he is stuck inside the 
<evil></evil> tree node 
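
The wrapping step can be sketched as below. This is an assumption-laden illustration, not the talk's implementation: the helper names `newTreelockTag`, `treelock`, and `unlock` are invented, and `unlock` is handed the tag directly, whereas a real receiver would discover it while parsing.

```javascript
const crypto = require("crypto");

// Generates a fresh random tag name for a single submission.
function newTreelockTag() {
  return "_treelock_" + crypto.randomBytes(8).toString("hex").toUpperCase();
}

// Wraps untrusted text in a randomly named element.
function treelock(untrusted) {
  let tag = newTreelockTag();
  while (untrusted.includes(tag)) {
    tag = newTreelockTag(); // regenerate in the (very unlikely) case of a collision
  }
  return { tag, xml: "<" + tag + ">" + untrusted + "</" + tag + ">" };
}

// Recovers the locked text; returns null if the expected lock is not intact.
function unlock(xml, tag) {
  const open = "<" + tag + ">";
  const close = "</" + tag + ">";
  if (!xml.startsWith(open) || !xml.endsWith(close)) {
    return null;
  }
  return xml.slice(open.length, xml.length - close.length);
}

const locked = treelock("XXX</evil><admin>true</admin>");
console.log(locked.xml);
console.log(unlock(locked.xml, locked.tag));
```

The attacker's `</evil>` is still inside the locked element, because closing it would require guessing the random tag name.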



Handling Treelocked Data 



• The DOM/Slow way 

- Create a function called "expander" and run it on a parsed (but 
not yet schema validated) XML document 

- For each node, if the contents begin with treelock, then: 

• Create a false prefix and suffix for a valid XML document 

• Create a bytestream composed of the prefix, the full text body inside 
the XML node, and the suffix 

• Parse that bytestream into a new XML document 

• Run expander on that document 

• Replace the treelock node with the contents inside the expansion 

• The above is not fast, but it does work 

- There are almost certainly faster, SAX based ways to implement 
the above 



Caveats 



Requires a parser that cares where in the parse tree sensitive 
values show up 

- <book> 

<title lang="eng">Harry Potter</title> 
<price>29.99</price> 
</book> 

- An xpath search for /book/title will return all titles within books. An 
xpath search for //title will return all titles, whether or not they are in 
a book. 

• xml.findAll("title") also fails 

- Treelocking cannot save you if you do not care if <evil> provides you 
sensitive system data 

- Schemas can save you, unless <evil> is allowed to house arbitrary XML 
(which it would, if it's intended to be opaque at your layer) 

- This is potentially a problem for SAX parsers and naive XML 
implementations 

- Look for this on pen tests that allow you to submit XML, or even 
those that accept plaintext but dump it into an n-tier web backend 



What about JSON? 



JSON, unlike XML, doesn't have strings that need to match 
between openers and closers 

The naive approach would be to use JavaScript comments 
to wrap the section 

- {"foo": { 

//TREELOCK:12341 

"evil":"xxx" 

//END_TREELOCK:12341 

} 
} 

- Problems 

• 1) JSON doesn't actually support comments! They're afraid people 
will use them for something significant (heh). So parsers that don't 
support comments will fail. 

• 2) Treelock enforcement won't be mandatory, like it would be for XML. 
Even if Treelock constraints aren't met, parsing will work anyway. It 
fails open. This is not the best way to build a security technology. 
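
The first problem is easy to observe with a strict parser. The sketch below feeds the comment-wrapped document from the slide to the standard `JSON.parse`; the helper name `parses` is just for illustration.

```javascript
// The comment-wrapped form from the slide.
const withComments = `{"foo": {
  //TREELOCK:12341
  "evil": "xxx"
  //END_TREELOCK:12341
}}`;

// The same data with the comments removed.
const withoutComments = '{"foo": {"evil": "xxx"}}';

// Reports whether a strict JSON parser accepts the text.
function parses(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(parses(withComments));
console.log(parses(withoutComments));
```

A strict parser rejects the first form outright, while a lenient parser that strips comments would accept it without ever checking the lock, which is the fail-open behavior described in problem 2.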



Towards Explicit Treelocking 



Original thought: Let's encrypt chunks of the tree with a key, and then 
place the key next to the Base64-encoded portion of the ciphertext 

- Base64 is a way to map three arbitrary bytes to four "safe" bytes from a-zA-Z0-9+/= 

- {"foo":{ 

"_treelock_0": { 
"key": "12341", 

"locked_data": "abcdabcdabcd" 
} 
} 
} 

Then I realized: No actual value to the crypto! 

- With Base64, it was now impossible to escape a particular region of the tree - 
no way to emit terminators 

- Without Base64, random noise would be guaranteed to eventually generate 
terminators 

- The value came not from encrypting, but from stripping access to otherwise 
necessary bits 

• Once again, no point to encrypting data when 
the key is right next to the encrypted data. 
Ahem. Everyone, please stop doing this. 
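
A quick way to convince yourself of the "no way to emit terminators" claim, sketched in Python with the standard base64 module. The TERMINATORS set is an illustrative assumption about which characters matter to JSON, XML and SQL grammars. Note that +, / and = are still in the output alphabet.

```python
import base64
import string

B64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")
# Characters that open, close or separate things in common grammars.
TERMINATORS = set("\"'{}[]<>,:;\\ \n")


def encoded_chars(data: bytes) -> set:
    """The set of characters that appear in the Base64 form of data."""
    return set(base64.b64encode(data).decode("ascii"))


def cannot_escape(data: bytes) -> bool:
    """True if the encoded form contains no grammar terminators."""
    chars = encoded_chars(data)
    return chars <= B64_ALPHABET and chars.isdisjoint(TERMINATORS)


if __name__ == "__main__":
    hostile = b'"}, "admin": true, {"x": "'
    print(base64.b64encode(hostile).decode("ascii"))
    print(cannot_escape(hostile))
```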



...and Base64 is pretty fast. 

$ dd if=/dev/urandom count=1000024 bs=1024 | cat >/dev/null 
1024024576 bytes (1.0 GB) copied, 25.109 s, 40.8 MB/s 

$ dd if=/dev/urandom count=1000024 bs=1024 | base64 >/dev/null 
1024024576 bytes (1.0 GB) copied, 24.499 s, 41.8 MB/s 



...and while it expands data... 

...we get most of the expansion back post- 
compression 

- $ ls -l test test.b64 test.b64.gz 
-rw-r--r-- 1 dan None 1048576 2010-04-18 07:09 test 
-rw-r--r-- 1 dan None 1416501 2010-04-18 07:10 test.b64 
-rw-r--r-- 1 dan None 1076498 2010-04-18 16:25 test.b64.gz 

• (with some caveats) 
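
The same measurement can be repeated in Python. This is a sketch under assumptions: random input from os.urandom, zlib at level 9 standing in for gzip, and no line breaks in the Base64 output (the base64 command line tool adds them, which is why the file above is slightly larger than 4/3 of the input).

```python
import base64
import os
import zlib


def sizes(n_bytes: int = 256 * 1024):
    """Return (raw, base64, compressed-base64) sizes for random input."""
    raw = os.urandom(n_bytes)
    encoded = base64.b64encode(raw)          # 4 output bytes per 3 input
    compressed = zlib.compress(encoded, 9)   # wins back most of the growth
    return len(raw), len(encoded), len(compressed)


if __name__ == "__main__":
    raw, enc, comp = sizes()
    print("raw:", raw, "base64:", enc, "base64+zlib:", comp)
```

Random input is the worst case for the raw file; real payloads usually compress better on both sides.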



So, JSON can be handled much like 
XML, but adding a de-base64 phase 

Encoded: 

- {"test":{ 

"_treelock_0": 
"ImV2aWwiIDogInRvbm90b25vIgo=" } 
} 

Decoded: 

- {"test":{ 

"evil" : "tonotono" 

} 
} 
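
One way the de-base64 phase might look, as a Python sketch. The expand function and the rule "any key starting with _treelock_ holds Base64-encoded object members" are assumptions made for illustration. The Base64 string used here is the standard encoding of "evil" : "tonotono" followed by a newline.

```python
import base64
import json

PREFIX = "_treelock_"  # assumed marker for locked members


def expand(node):
    """Replace "_treelock_N": "<base64>" members with the members inside."""
    if isinstance(node, list):
        return [expand(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key.startswith(PREFIX) and isinstance(value, str):
            fragment = base64.b64decode(value, validate=True).decode("utf-8")
            # Parse the fragment as the body of its own object. A fragment
            # that tries to close the object early makes json.loads raise,
            # so the locked content cannot reach the surrounding tree.
            out.update(json.loads("{" + fragment + "}"))
        else:
            out[key] = expand(value)
    return out


ENCODED = '{"test": {"_treelock_0": "ImV2aWwiIDogInRvbm90b25vIgo="}}'

if __name__ == "__main__":
    print(expand(json.loads(ENCODED)))
```

Decoded content is merged as data and is not expanded again, so nested locks supplied by an attacker stay inert.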



...and this approach can be used for 
other protocols, like LDAP and XML... 

• LDAP: ldap://localhost:389/ 
_PL=ABCDABCDABCDABCDABCDABCDABCDABCD 
->ou=ABCDABCDABCD,o=ABCDABCDABCD 

-> ou=People,o=JNDITutorial 
- Note the intentional use of multiple rounds of Base64 

• XML: 
<foo> 

<evil> 

<_treelock_0> 

yru9BZ8AEIqm 

</_treelock_0> 

</evil> 

</foo> 

(yes, I know a CDATA section would theoretically be more XML-y) 



And SQL too. 

select count(*) from foo where x='x' or '1'='1'; 
3 

select base64_encode("x' or '1'='1"); 
eCcAb3IAJzEnPScx 

select count(*) from foo where 

x=base64_decode("eCcAb3IAJzEnPScx"); 
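
The same experiment can be run end to end with Python's sqlite3 module. This is a sketch, not the MySQL setup from the slides: base64_decode is registered from Python as a stand-in for the stored procedure or UDF, and the three-row table foo is invented for the demo.

```python
import base64
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # Stand-in for a database-side base64_decode() function.
    conn.create_function(
        "base64_decode", 1,
        lambda s: base64.b64decode(s, validate=True).decode("utf-8"),
    )
    conn.execute("create table foo (x text)")
    conn.executemany("insert into foo values (?)", [("a",), ("b",), ("c",)])
    return conn


def count_naive(conn, user_input):
    # Vulnerable: attacker text is spliced straight into the grammar.
    sql = "select count(*) from foo where x='" + user_input + "'"
    return conn.execute(sql).fetchone()[0]


def count_wrapped(conn, user_input):
    # Only Base64 characters ever cross into the query text.
    token = base64.b64encode(user_input.encode("utf-8")).decode("ascii")
    sql = "select count(*) from foo where x=base64_decode('" + token + "')"
    return conn.execute(sql).fetchone()[0]


if __name__ == "__main__":
    db = make_db()
    attack = "x' or '1'='1"
    print(count_naive(db, attack), count_wrapped(db, attack))
```

The wrapped query compares x against the attack string as a plain string, so it matches no rows.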





Alternate Encoding Scheme Discussion 
(I'm sure this isn't the first) 

• "I think that easy way to protect against SQL injection is 
to convert inputted data into binary format, so that 
whatever input is, in sql query it will consist only of 1s 
and 0s." 

- Dark.avenger@email.cz , 14-Aug-2008 

• "If there IS a 1-to-1 correspondence, then EITHER your 
solution only makes it a bit harder to perform a SQL 
injection (a hacker would have to figure out what 
mapping was used between the text and the 'binary' 
format), OR you've come up with simply another way 
to escaping your data." 

- iaimthorn@yahoo.com, 13-Oct-2008 



Base64: THE EPIC BANHAMMER OF 
ULTIMATE ESCAPE (with one minor flaw) 

select count(*) from foo where x=base64_decode("eCcAb3IAJzEnPScx"); 

You know what eCcAb3IAJzEnPScx is not? 

- SQL. XML. JSON. LDAP. 

- So we're type-safe going into base64_decode() 

The response from base64_decode is always going to be treated as a 
String Literal in the parse tree of the SQL interpreter itself 

- Now, attacker strings are constrained to being just strings, by the engine itself 
- not subqueries, not extra lookups, nothing 

- So we're type-safe coming out of base64_decode() 

- (There might be databases that screw this up, but I haven't seen them) 

- Note how in this model, instead of hoping that the right side of x= is an 
escaped string, the DBA knows it and everything breaks if it is not. Base64 
fails closed! 
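
A small Python sketch of what "fails closed" means in practice. decode_or_reject is a hypothetical helper name. One detail worth knowing: Python's base64.b64decode silently discards non-alphabet characters unless validate=True is passed, so a careless decoder can turn this back into fail-open.

```python
import base64
import binascii


def decode_or_reject(literal: str) -> str:
    """Strictly decode a Base64 literal; raise ValueError on anything else."""
    try:
        return base64.b64decode(literal, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError) as exc:
        raise ValueError("not a Base64-wrapped string") from exc


if __name__ == "__main__":
    good = base64.b64encode("O'Malley".encode("utf-8")).decode("ascii")
    print(decode_or_reject(good))
    try:
        decode_or_reject("x' or '1'='1")   # raw, unwrapped attacker input
    except ValueError as err:
        print("rejected:", err)
```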

Escaping is an attempt to, on a character by character basis, special case 
anything that might alter the parse tree 

- Effectively, this is a blacklist 

- How hard is it to know what to put on this list? 



Unicode Will Cut You 



Last year, Moxie Marlinspike and I independently broke 
most X.509 implementations using a NULL terminator 

- "Great! Everybody should know to use a RegEx to alter or 
alert on NULLs!" 

- What I didn't say last year, was that there was at least one 
other character that I could have used 

• 0xC0 0x80 - A UTF-8 overwide NULL, vs. CryptoAPI, also 
terminated X.509 Common Names 

• You have to mark a special flag to ban overwide characters in 
Windows Unicode, and they didn't. 

- And there are up to four other byte sequences that might 
also get parsed as a NULL 

- To say nothing about "best fit matching", which can be 
different at every tier 

- The problem with escaping is that you are, in fact, parsing 
- and if/when you drift from the real parser, you fail 
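
Parser drift in miniature, as a Python sketch. The lenient decoder below is a hypothetical stand-in for a downstream component that accepts overwide two-byte sequences; it is not a model of CryptoAPI. The filter looks for a literal 0x00 byte and sees nothing, the lenient decoder produces a NUL anyway, and Python's own strict UTF-8 decoder refuses the input.

```python
def filter_sees_nul(data: bytes) -> bool:
    """A naive byte-level check for NUL, as a regex-style filter might do."""
    return b"\x00" in data


def lenient_decode(data: bytes) -> str:
    """Hypothetical lenient decoder: accepts overwide 2-byte sequences."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0xBF:
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("byte sequence not handled in this sketch")
    return "".join(out)


def strict_accepts(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


PAYLOAD = b"good.example\xc0\x80.evil.example"

if __name__ == "__main__":
    print(filter_sees_nul(PAYLOAD))
    print(repr(lenient_decode(PAYLOAD)))
    print(strict_accepts(PAYLOAD))
```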



Related Work 



There is some really cool related work done by Meredith Patterson 
called Dejector, which attempts to run a second SQL parser in front 
of the first one so as to operate filters there 

- This works well - especially as a way to execute intelligent filtering and 
scrubbing! 

- Dejector's filters, integrated as a plugin to the actual parse tree used 
by the actual database, would probably be the best of both worlds 

- http://www.thesmartpolitenerd.com/code/dejector.html 

GreenSQL seems to be a productized version of the SQL-parser- 
before-the-SQL-parser design; unsure if it's a filter or a scrubber 

- Has had some problems in the past re: parser drift, unfortunately 

- Thanks Kuza55! 

The primary difference of this approach is that it is aggressively 
trying to find a better interface to/for existing parsers 



What was that minor flaw? 



Base64 DOES allow / and + and = 

- These are meaningful in many grammars 

- They're no problem if you quote the entire section, but we can expect 
people will be so enamored of expressing things in this 
"encapsulated form" that they will forget to quote 

• KNOW HOW PEOPLE WILL FAIL WITH YOU, OR PEOPLE WILL FAIL WITH YOU 

One Solution: Base62 

- 90: 9 
91: + 
92: / 
93: = 
94: == 
95: === 

- Nonstandard (and just hacked up) 

Another solution: Base32 

- Maps 5 bytes to 8 characters 

- Standard 

- Less efficient 
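
One reading of the Base62 table, sketched in Python: take ordinary Base64 and use the digit 9 as an escape character, so that 9, +, / and trailing padding become two-digit codes. This interpretation and the function names are assumptions. The === code is left out because standard Base64 never emits three padding characters.

```python
import base64

ESCAPES = {"9": "90", "+": "91", "/": "92"}
PADDING = {"": "", "=": "93", "==": "94"}
UNESCAPES = {"0": "9", "1": "+", "2": "/", "3": "=", "4": "=="}


def b62_encode(data: bytes) -> str:
    """Base64, then escape everything that is not a letter or digit."""
    b64 = base64.b64encode(data).decode("ascii")
    body = b64.rstrip("=")
    pad = b64[len(body):]
    return "".join(ESCAPES.get(ch, ch) for ch in body) + PADDING[pad]


def b62_decode(text: str) -> bytes:
    """Undo the escapes, then strictly decode the Base64 underneath."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "9":
            code = next(chars, None)
            if code not in UNESCAPES:
                raise ValueError("bad escape sequence")
            out.append(UNESCAPES[code])
        else:
            out.append(ch)
    return base64.b64decode("".join(out), validate=True)


if __name__ == "__main__":
    token = b62_encode(b"\xfb\xff\xbf x' or '1'='1")
    print(token, token.isalnum(), b62_decode(token))
```

The output is purely alphanumeric, so it survives even when somebody forgets to quote it.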



Implementation Notes 



Where to get a Base6*_Encode/Decode function? 

- Stored Procedure 

• http://wi-fizzle.com/downloads/base64.sql 

• Not wildly fast 

• Embeds into database 

• May require DBA permission 

- UDF (User Defined Function) 

• MySQL extension, written in C 

• Very fast, though not as fast as an integrated function 

• DEFINITELY requires DBA permission, and is less portable 

• Need to write a good one 

- Integration into DB itself 

• Surprisingly, doesn't already exist 

• Potential for much greater support across the parse tree 

• Would be much faster 



Caveats 



• Stored functions and UDFs are somewhat limited in 
what they can protect 

- Pretty much only the right side of comparison arguments 
can be the result of a computation (at least in MySQL) 

- SELECT * from Base64_Decode("Zm9vCg==") doesn't work 

- If Base* functionality was integrated into the parser itself 
(as I plan to do at some point) then typing would need to 
be implemented 

• Right now, we only support one type 

• If we supported multi-type, we might be pushed into emitting 
something like a subquery 

• Performance is still a major issue, particularly with the 
stored procedure 

- Even MySQL UDFs are pretty slow - note the actual 
"cipher" isn't a problem, only the context switch 



Safety 



• While Base* is about the simplest and safest bitstream 
transformation you can imagine, people still manage to screw it up 
pretty royally 

- base64decode(char *ptr) { 

for(...) { 
lookuptable[*ptr++]; <-- *ptr is a (possibly signed) char: negative index into lookuptable 

... 
} 

... 

} 

- What happens when the byte at ptr isn't one of the 64 valid Base64 
characters? *sighs* 

• So you do need a safe Base64 implementation - either check the 
value at *ptr, or have lookuptable[] be 256 bytes 

• Also some safety issues if network-based filters don't "follow along" 
and unpack the Base64 
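
The "256-entry table plus a check" advice, sketched in Python rather than C. Python cannot index a list with a negative byte the way the C bug does, so this only illustrates the shape of a safe decoder: every possible byte value has a table slot, and anything outside the alphabet is rejected. The names are illustrative, and the sketch does not verify that unused trailing bits are zero.

```python
ALPHABET = (
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789+/"
)

# One slot per possible byte value; -1 marks "not a Base64 character".
TABLE = [-1] * 256
for _index, _char in enumerate(ALPHABET):
    TABLE[_char] = _index


def strict_b64decode(data: bytes) -> bytes:
    """Decode Base64, rejecting any byte outside the alphabet."""
    if len(data) % 4 != 0:
        raise ValueError("length is not a multiple of 4")
    body = data.rstrip(b"=")
    if len(data) - len(body) > 2:
        raise ValueError("too much padding")
    acc = 0
    bits = 0
    out = bytearray()
    for byte in body:
        value = TABLE[byte]          # always in range: table has 256 slots
        if value < 0:
            raise ValueError("invalid Base64 byte: %r" % byte)
        acc = (acc << 6) | value
        bits += 6
        if bits >= 8:
            bits -= 8
            out.append((acc >> bits) & 0xFF)
            acc &= (1 << bits) - 1   # keep only the bits not yet emitted
    return bytes(out)


if __name__ == "__main__":
    print(strict_b64decode(b"Zm9vCg=="))
```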



The Challenge: Debuggability 



Anyone who's ever debugged an XML-based protocol 
(WS-*, ahem) with Base64-encoded blobs in CDATA 
sections, from a basic text editor, knows what I'm 
talking about 

- There are entire protocols that nobody has ever entirely 
decoded, because to do so would require key after key 
after key which of course notepad.exe can't be expected to 
deal with 

- The hope of proposing unkeyed Base64 wrapping as the 
one true escaping mechanism is that viewers and 
debuggers can be modified to expect to be able to inline- 
decode Base6* content with nothing more than the simple 
algorithm 



The Hook: Ease Of Use For The Developer 

Developers like string interpolation for porting 
variables into SQL, irrespective of security 

- Interpolation: $query = "select * from $table where 
fname='$fname' and country='$country';"; 

Surprisingly, they'll go to some interesting lengths 
to keep their variables near the rest of the query 

- Concatenation: query = "select * from " + table + 
" where fname = '" + fname + "' and country='" + 
country + "';"; 

- Conjecture: Fitts's Law (from the UI world) applies to 
programmers - the closer data is to the code that 
operates on it, the faster the code can be written 



From 

What we've been advising: 

- $table = escape($table); 
$fname = escape($fname); 
$country = escape($country); 

... 

$query = "select * from $table where 
fname='$fname' and country='$country';"; 

• ... can be a very long distance (as every static analyzer 
developer has realized) 

• If it were to happen that those escape()'s were 
removed, nobody would notice 



To 



What we could advise 

- $query = "select * from bd($!table) where fname=bd($!fname) 
and country=bd($!country);" 

• We could enrich interpolation syntax by adding inline Base* support 

• Couldn't do this with escaping, because escapes were too specific 

• THERE ARE MANY WAYS OF DOING THIS, MOST OF WHICH ARE BAD. 
But the core idea seems clearly good - it's a rough form of marking 
type 

- Happy programmer 

• Who needs to escape anyway, to support Unicode and names with 
apostrophes in them properly 

- Happy security engineer! 

• Maybe even happy DBA, who can enforce use of Base64 on all 
incoming String Literals 
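
Python has no $!var syntax, but an f-string gets close to the same feel. This is a sketch: bd here is a hypothetical helper that renders a value as a base64_decode('...') expression, the table name people is invented, and the database is assumed to provide base64_decode. The table name is left alone because, per the Caveats slide, a function result can only stand in for a value.

```python
import base64


def bd(value: str) -> str:
    """Render a value as a base64_decode('...') SQL expression."""
    token = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return "base64_decode('" + token + "')"


def build_query(fname: str, country: str) -> str:
    # The variables stay right next to the query text they belong to.
    return (
        "select * from people "
        f"where fname={bd(fname)} and country={bd(country)};"
    )


if __name__ == "__main__":
    print(build_query("O'Malley", "x' or '1'='1"))
```

If a bd() call is dropped, the missing wrapper is visible in the query itself, and a DBA who requires base64_decode on every string literal can reject it.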



Summary Of Attacks 



Most of this talk has been about how to protect networks, 
but you can't design effective defenses without strong 
offensive knowledge 

- While there's no easy way to spoof Referer, there are many easy 
ways of suppressing it (META refresh, HTTPS->HTTP, plugins). 
Against XSRF systems that fail Referer checking open (as some 
CMS's do), this is a general and permanent bypass 

- Overwide UTF-8/16 NULLs may still terminate, especially (but 
not exclusively) on Windows 

- Maone's eval(name) is probably the single best way to 
implement a reflected XSS attack 

• Everybody should know about it, so I'm just going to cite it. 

- XML (and possibly JSON) parsers that implement tree search 
may allow elements to be retrieved out of their expected (and 
secured) location in the tree 



Defensive Summary [0] 

• Defense is hard. 

- It requires being aware of a whole range of issues, 
most of which have nothing to do with the driving 
problem at hand 

- If you are not obsessed with the entire field of 
engineering requirements, you cannot hope to 
improve things 



Defensive Summary [1] 



Session handling for browsers remains embarrassingly 
weak 

- It's possible to guarantee strong binding between requested 
URLs and active credentials, but only through a method that 
requires developers to manually integrate a randomized token 
into every single navigation event they execute 

• Even this "gold standard" leaks via Referer, though it's theoretically 
possible to handle this leakage by one time tokens or hashing against 
the rest of the URI 

- While it's possible to suppress execution of JavaScript and HTML 
after a chosen point in an HTTP bytestream ("Harakiri.JS"), there 
is no way to reliably trigger this suppression on cross-domain 
events 

- If we can ever get session handling to be strong, entire families 
of XSS and XSRF attacks will disappear behind a solid and 
reliable wall of authentication 



Defensive Summary [2] 

Treating languages as parse trees allows us to 
constrain injections across a wide variety of 
protocols, including XML, JSON, LDAP, and SQL 

- Using Base* as an impoverished, explicit, and 
universal escaping mechanism has subtle but 
remarkably robust advantages 

- Not only would web sites be more secure, but 
they'd have better support for good Irish names 
like O'Malley 



